**High Performance Computer Architecture (CS60003)**

**Class Test 1**

**Model Solution**

* Answer all questions neatly. All answers should be concise and to the point.
* **Illegible/ ambiguous answers will be awarded 0 mark.**

1. Consider a simple 5 stage MIPS processor. Assume that it uses a unified data/instruction cache, and **30%** of all instructions issued are of load/store type. The program being executed does not cause any data or control hazards. Determine the increase/decrease in memory bandwidth of the MIPS processor compared to a similar processor that does not use instruction pipelining. Ignore various pipeline hazards and pipeline overheads. (Hint: Memory bandwidth is the number of bytes that need to be accessed per cycle.) **[5]**

**Ans:**

**For the single data/instruction cache: CPI=1+0.3=1.3 -- Award 2 Marks**

**For a processor that does not use pipeline: CPI=5 --- Award 2 Marks**

**Increase in memory bandwidth=5/1.3 =3.84 --- Award 1 Marks**

1. In a certain assembly program, **20%** of the instructions are memory operations. An unpipelined processor took **20** seconds to execute the program. We then increased the main memory size and observed that the average time for execution of memory instructions decreased by **30%.** The execution time of other instructions remained unaffected. What will be the new execution time of the program? **[5]**

**Ans:**

**Amount of improvement for load/stores = 1/.70 = 1.44 -- Award 3 marks**

**﻿Time after improvement = (Time affected by improvement/ Amount of improvement) + Time unaffected by improvement**

**Therefore, Time=20(.20)/1.44+20(.80) = 18.8 sec --- Award 2 marks**

1. Dolphin is a RISC processor. It uses simple bimodal predictors of two-bit saturating counters for branch prediction. Simulation experiments on the Dolphin processor for an important workload indicated that two branch instructions having the following two 32-bit addresses (in binary) execute frequently:

**Address A:1000 1101 1100 0011 0110 1011 0100 1001**

**Address B: 1000 1101 0110 1001 1110 1011 0100 1001**

In the simulation experiment very low branch prediction accuracy was observed for these two branches. Based on an analysis of the simulation data, the processor designers suspect that the above two branch instructions must be conflicting (hashing to the same entry) in the Branch History Table (BHT). At least how many entries must the Branch History Table (BHT) have to prevent these two branches from interfering (conflicting) with each other? In this case, **at least** how much storage space (in bytes) is required for implementing the BPB?  **[3+2]**

**Answer: These two addresses have the lower 15 bits in common.**

**Thus, the index bits would need to at least 16 bits to map these two addresses to different entries in the branch predictor. …. 3 Mark**

**It will require a 64K-entry predictor, which is 64K\*2bits = 16KBs … 2 Mark**

1. Consider processor similar to a 5-stage MIPS processor using 2-bit predictors, each initialized to strongly not taken. The following **for** loop that iterates 100,000 times was executed on the processor. Out of the total number of iterations, in how many iterations the two branches: branch1 and branch2 would be correctly predicted,? Assume that the size of the BHT is extremely large. **[5]**

**for (i=0; i<100,000; i++) {**

**if (i%2==1) { // branch 1: taken if i is odd**

**… … … // do something**

**}**

**} // branch 2: loop**

**Ans: branch 1: 50,000/100,000 …. 2 Mark**

**branch 2: 99,997/100,000 …. 3 Mark**

1. A designer has proposed a hardware optimization for a given processor that would decrease the average CPI by **10%**. Unfortunately, this optimization would result in increasing the cycle time by **10%**. Is this optimization worth implementing? Show the details of your calculations. **[5]**

**CPI(new) = 0.9 \* CPI (old) ////CPI decreased by 10%**

**Execution time(new)= 0.9\*CPI(old) \* 1.1 Cycles (old)**

**= 0.99 \* Execution time(old)**

**Since the new execution time is lower, the optimization is worthwhile**.

1. Assume that a simple MIPS processor with 5 stage instruction pipeline is being used for executing a program. There is no forwarding hardware or any additional hardware to reduce branch delays. The processor is using a static **not taken** predictor. The characteristics of the program being executed are as follows:
   * The program consists of **20%** load type of instructions. In which **50%** of loads are used by next instruction resulting in data hazards. There are no other data hazards.
   * **20%** of the program instructions are branches. Of these **50%** are unconditional branches and the rest are conditional branches. **25%** of conditional branches mispredicted.

What is the average CPI? **[5]**

**Ans:**

**Average CPI = 1+ (0.1)(2) + (0.1)\*2 + (0.1\*0.25) \* 2**

**= 1.45 Award 5 marks**

1. Consider an unpipelined **1 GHz** multicycle processor. It takes **5** cycles to execute memory instructions and **4** cycles to execute ALU and other operations. Assume that the relative frequencies of different types instructions in a very large program being executed are:

* Load/store instructions=**40%**
* Other instructions=**60%,**

Compute the speedup that would be achieved by pipelining the processor similar to a MIPS 5 stage pipeline. Assume that the pipeline overhead would be **0.2ns**. Ignore the effect of various types of hazards. **[5]**

**Solution:**

**Average instruction execution time:**

**unpipelined= 1ns \* (60%\*4+ 40%\*5) =4.4ns**

**Pipelined=1.2ns**

**Speedup=4.4/1.2=3.7 times**

**-----------The End------------**